interaction label
Context-Aware Human Behavior Prediction Using Multimodal Large Language Models: Challenges and Insights
Liu, Yuchen, Lerch, Lino, Palmieri, Luigi, Rudenko, Andrey, Koch, Sebastian, Ropinski, Timo, Aiello, Marco
Predicting human behavior in shared environments is crucial for safe and efficient human-robot interaction. Traditional data-driven methods to that end are pre-trained on domain-specific datasets, activity types, and prediction horizons. In contrast, the recent breakthroughs in Large Language Models (LLMs) promise open-ended cross-domain generalization to describe various human activities and make predictions in any context. In particular, Multimodal LLMs (MLLMs) are able to integrate information from various sources, achieving more contextual awareness and improved scene understanding. The difficulty in applying general-purpose MLLMs directly for prediction stems from their limited capacity for processing large input sequences, sensitivity to prompt design, and expensive fine-tuning. In this paper, we present a systematic analysis of applying pre-trained MLLMs for context-aware human behavior prediction. To this end, we introduce a modular multimodal human activity prediction framework that allows us to benchmark various MLLMs, input variations, In-Context Learning (ICL), and autoregressive techniques. Our evaluation indicates that the best-performing framework configuration is able to reach 92.8% semantic similarity and 66.1% exact label accuracy in predicting human behaviors in the target frame.
- Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
- North America > United States > Massachusetts (0.04)
- Europe > Switzerland (0.04)
Predicting and Understanding College Student Mental Health with Interpretable Machine Learning
Chowdhury, Meghna Roy, Xuan, Wei, Sen, Shreyas, Zhao, Yixue, Ding, Yi
Mental health issues among college students have reached critical levels, significantly impacting academic performance and overall wellbeing. Predicting and understanding mental health status among college students is challenging due to three main factors: the necessity for large-scale longitudinal datasets, the prevalence of black-box machine learning models lacking transparency, and the tendency of existing approaches to provide aggregated insights at the population level rather than individualized understanding. To tackle these challenges, this paper presents I-HOPE, the first Interpretable Hierarchical mOdel for Personalized mEntal health prediction. I-HOPE is a two-stage hierarchical model, validated on the College Experience Study, the longest longitudinal mobile sensing dataset. This dataset spans five years and captures data from both pre-pandemic periods and the COVID-19 pandemic. I-HOPE connects raw behavioral features to mental health status through five defined behavioral categories as interaction labels. This approach achieves a prediction accuracy of 91%, significantly surpassing the 60-70% accuracy of baseline methods. In addition, our model distills complex patterns into interpretable and individualized insights, enabling the future development of tailored interventions and improving mental health support. The code is available at https://github.com/roycmeghna/I-HOPE.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > New York > New York County > New York City (0.05)
- North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
- (6 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.68)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
- Education > Educational Setting > Higher Education (1.00)
Domain Knowledge Driven Pseudo Labels for Interpretable Goal-Conditioned Interactive Trajectory Prediction
Sun, Lingfeng, Tang, Chen, Niu, Yaru, Sachdeva, Enna, Choi, Chiho, Misu, Teruhisa, Tomizuka, Masayoshi, Zhan, Wei
Motion forecasting in highly interactive scenarios is a challenging problem in autonomous driving. In such scenarios, we need to accurately predict the joint behavior of interacting agents to ensure the safe and efficient navigation of autonomous vehicles. Recently, goal-conditioned methods have gained increasing attention due to their advantage in performance and their ability to capture the multimodality in trajectory distribution. In this work, we study the joint trajectory prediction problem with the goal-conditioned framework. In particular, we introduce a conditional-variational-autoencoder-based (CVAE) model to explicitly encode different interaction modes into the latent space. However, we discover that the vanilla model suffers from posterior collapse and cannot induce an informative latent space as desired. To address these issues, we propose a novel approach to avoid KL vanishing and induce an interpretable interactive latent space with pseudo labels. The proposed pseudo labels allow us to incorporate domain knowledge on interaction in a flexible manner. We motivate the proposed method using an illustrative toy example. In addition, we validate our framework on the Waymo Open Motion Dataset with both quantitative and qualitative evaluations.